A discretization algorithm based on Class-Attribute Contingency Coefficient
Abstract
Discretization algorithms play an important role in data mining and knowledge discovery. They not only summarize continuous attributes concisely, which helps experts understand the data, but also make learning faster and more accurate. In this paper, we propose a static, global, incremental, supervised, top-down discretization algorithm based on the Class-Attribute Contingency Coefficient. An empirical evaluation of seven discretization algorithms on 13 real datasets and four artificial datasets showed that the proposed algorithm generated a better discretization scheme, one that improved classification accuracy. Our approach also achieved promising results in the execution time of discretization, the number of generated rules, and the training time of C5.0. © 2007 Elsevier Inc. All rights reserved.
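To make the idea of a supervised, top-down discretizer concrete, the following is a minimal sketch that greedily adds cut points while a contingency coefficient between class and interval keeps improving. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the criterion used here is the plain Pearson contingency coefficient C = sqrt(chi2 / (chi2 + N)), and the paper's own coefficient may differ, for example in how it normalizes for the number of intervals. The function names `contingency_coefficient` and `discretize` and the `tol` parameter are hypothetical.

```python
import bisect
import math
from collections import Counter


def contingency_coefficient(values, labels, cuts):
    """Pearson's C = sqrt(chi2 / (chi2 + N)) for the class-by-interval table.

    Assumption: this is the textbook coefficient, not necessarily the
    criterion defined in the paper.
    """
    cuts = sorted(cuts)
    # Interval i holds the values v with cuts[i-1] < v <= cuts[i].
    cells = Counter(
        (lab, bisect.bisect_left(cuts, v)) for v, lab in zip(values, labels)
    )
    row, col = Counter(), Counter()
    for (lab, idx), n in cells.items():
        row[lab] += n
        col[idx] += n
    total = len(values)
    chi2 = 0.0
    for lab in row:
        for idx in col:
            expected = row[lab] * col[idx] / total
            observed = cells.get((lab, idx), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + total))


def discretize(values, labels, tol=1e-9):
    """Greedy top-down search: start with one interval, add the best cut."""
    distinct = sorted(set(values))
    # Candidate cut points are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts, best = [], 0.0
    while candidates:
        score, cut = max(
            (contingency_coefficient(values, labels, cuts + [c]), c)
            for c in candidates
        )
        if score <= best + tol:
            break  # no candidate improves the coefficient; stop splitting
        cuts.append(cut)
        candidates.remove(cut)
        best = score
    return sorted(cuts)


values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(discretize(values, labels))
```

On this toy attribute the search keeps a single cut between the two class clusters, because no second cut raises the coefficient any further. Without a penalty on the number of intervals, the unnormalized coefficient can favor extra splits on noisier data, which is one reason published criteria typically include some normalization.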
Similar resources
Bayesian Models to Assess Risk of Corruption of Federal Management Units
This paper presents a data mining project that generated Bayesian models to assess risk of corruption of federal management units. With thousands of extracted features related to corruptibility, the data were processed using techniques like correlation analysis and variance per class. We also compared two different discretization methods: Minimum Description Length Principle (MDLP) and Class-At...
A Novel Tree Based Classification
Classification is a data mining (DM) technique used to predict or forecast unknown information from historical data. There are many classification techniques. ID3 is a very popular tree-based classification algorithm for categorical data that does not support continuous data. The attribute selection process plays a major role in building a classification tree model. Attribute Selection in...
Fast Voltage and Power Flow Contingency Ranking Using Enhanced Radial Basis Function Neural Network
Deregulation of power systems in recent years has made static security assessment a major concern, for which a fast and accurate evaluation methodology is needed. Contingencies related to voltage violations and power line overloading have been responsible for power system collapse. This paper presents an enhanced radial basis function neural network (RBFNN) approach for on-line ranking of ...
Fast Class-Attribute Interdependence Maximization (CAIM) Discretization Algorithm
Discretization is the process of converting a continuous attribute into an attribute that contains a small number of distinct values. One of the major reasons for discretizing an attribute is that some machine learning algorithms perform poorly with continuous attributes and thus require front-end discretization of the input data. The paper describes a Fast Class-Attribute Interdependence Max...
Ameva: An autonomous discretization algorithm
This paper describes a new discretization algorithm, called Ameva, which is designed to work with supervised learning algorithms. Ameva maximizes a contingency coefficient based on Chi-square statistics and generates a potentially minimal number of discrete intervals. Its most important advantage, in contrast with several existing discretization algorithms, is that it does not need the user to ...
Journal: Inf. Sci.
Volume: 178
Publication date: 2008